A Framework for Word Spotting In Scanned Urdu Documents by Exploiting the Dot Orientation

Authors

Muhammad Shafi

Faisal Iqbal

Iftikhar Ahmed Khan

Muhammad Irfan Khattak

Mohammad Saleem

Naeem Khan

Abstract

Urdu is one of the most widely used languages in the world, and character recognition and word-spotting algorithms are needed to make Urdu literature easily accessible and searchable for the Urdu-reading population. Although a sizeable body of research exists on character recognition, very few articles have been published on word spotting in Urdu. Unlike English, in which only two letters carry dots, 17 of the 38 letters of the Urdu alphabet have dots either above or beneath them. This paper presents a data reduction framework for word spotting in scanned Urdu documents that exploits this dot orientation. After the proposed scheme is applied, the number of eligible candidates for the target word is greatly reduced. As demonstrated in the Results and Analysis section, the proposed algorithm shows promising results, with an average data reduction rate of 79.8%. [Muhammad Shafi, Faisal Iqbal, Iftikhar Ahmed Khan, Muhammad Irfan Khattak, Mohammad Saleem, Naeem Khan. A Framework for Word Spotting In Scanned Urdu Documents by Exploiting the Dot Orientation. Life Sci J 2013; 10(7s): 1163-1171]. (ISSN: 1097-8135). http://www.lifesciencesite.com 185
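The candidate-pruning idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how dot orientation is encoded, so this example assumes, purely for illustration, that each segmented word image has already been reduced to a count of dots above and below its main stroke. The names `Candidate`, `filter_by_dot_signature`, and `data_reduction_rate` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A segmented word image, summarized by its dot counts (assumed encoding)."""
    word_id: int
    dots_above: int
    dots_below: int


def filter_by_dot_signature(candidates, dots_above, dots_below):
    """Keep only candidates whose dot counts match those of the query word."""
    return [
        c for c in candidates
        if c.dots_above == dots_above and c.dots_below == dots_below
    ]


def data_reduction_rate(total, kept):
    """Percentage of candidates discarded before any detailed matching."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - kept) / total


# Hypothetical page with five word candidates.
page = [
    Candidate(0, dots_above=2, dots_below=1),
    Candidate(1, dots_above=0, dots_below=0),
    Candidate(2, dots_above=1, dots_below=0),
    Candidate(3, dots_above=3, dots_below=2),
    Candidate(4, dots_above=0, dots_below=1),
]

# Query word assumed to have two dots above and one below.
survivors = filter_by_dot_signature(page, dots_above=2, dots_below=1)
rate = data_reduction_rate(len(page), len(survivors))
```

Under this assumed encoding, only the surviving candidates would need to be passed to a more expensive shape matcher, which is the sense in which such a filter reduces data.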


Similar resources

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting is to make searchable unindexed image documents by locating word/words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...


A Survey on Various Word Spotting Techniques for Content Based Document Image Retrieval

Searching documents for information and retrieving relevant documents is a basic activity. Various tools are readily available for searching and retrieval from digital documents, but few robust methods are available for retrieval from historic documents and old manuscripts, as they are not digitized but available only in scanned formats. Conventional way of retrieval from scanned document imag...


Politeness Orientation in Social Hierarchies in Urdu

The present research is aimed at investigating how the politeness of the speakers of Urdu is influenced by their relative social status in society. The researcher took politeness theory of Brown and Levinson (1978, 1987) as a model. To observe politeness of Urdu speakers, speech act of apology with different strategies was selected. A Discourse Completion Task (DCT) was used as an instrument to...


Spotting words in handwritten Arabic documents

The design and performance of a system for spotting handwritten Arabic words in scanned document images is presented. Three main components of the system are a word segmenter, a shape based matcher for words and a search interface. The user types in a query in English within a search window, the system finds the equivalent Arabic word, e.g., by dictionary look-up, locates word images in an inde...


Word Spotting in Scanned Tamil Land Documents using K-Nearest Neighbor

Word spotting is a technique that can extract text from an input image. Here, we implement it on scanned Tamil land documents. Using Gabor features, we extract feature values for the input image. The main goal is to recognize the text from the document using a K-nearest neighbor classifier. The features were calculated and combined. Using these features, we can classify and rec...



Publication date: 2013
